Characterizing Open Addressing Hash Functions
Authors
Abstract
In [1], we showed that different open addressing hash functions perform differently when the data elements are not uniformly distributed. It is therefore tempting to attribute the difference to some mechanism governing the behavior of the hash functions. In this paper, we present a simple method of characterizing open addressing hash functions. We show that the ability to spread data does characterize the behavior of different open addressing hash functions. We measured and analyzed the speed at which a cluster of data elements spreads under different open addressing hash functions. Our experimental results and theoretical analysis show that different hash functions differ in their ability to spread out clustered data elements. A hash function that spreads clustered data elements over the whole table space faster and more uniformly performs better on clustered data. Experimental results are presented to support our claims, followed by some theoretical analysis.

1 Hash function families

In this paper we present hash functions in the form of nonlinear dynamical systems. We therefore first derive dynamical system expressions for the different hashing families.

1.1 Linear and quadratic hashing

The family of linear hash functions can be expressed as

$$H_L(k, i) = (h(k) + c_1 i) \bmod m, \tag{1}$$

where $h$ is an ordinary hash function and $i = 0, 1, \ldots$ is the probe number. This technique is known as linear hashing because the argument of the modulus operator depends linearly on the probe number. In general, $c_1$ must be relatively prime to $m$ if the probe sequence is to examine all slots in the hash table. To construct an equivalent transformation for equation (1), it must be rewritten as a recurrence relation, so that it depends explicitly on previous values of the iteration sequence.
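The coprimality condition on $c_1$ can be checked numerically. The following is a minimal Python sketch, not code from the paper: the function names and the parameter `home` (standing in for a precomputed $h(k)$) are hypothetical. It builds the probe sequence of equation (1) and tests whether it visits every slot.

```python
from math import gcd


def linear_probe_sequence(home, c1, m):
    """Return the probes H_L(k, i) = (h(k) + c1*i) mod m for i = 0..m-1.

    `home` is a hypothetical stand-in for the value h(k).
    """
    return [(home + c1 * i) % m for i in range(m)]


def covers_table(c1, m):
    """True if the probe sequence starting at slot 0 visits all m slots."""
    return len(set(linear_probe_sequence(0, c1, m))) == m


if __name__ == "__main__":
    m = 12  # illustrative table size, chosen only for this example
    for c1 in (1, 4, 5, 6, 7):
        # Columns: c1, whether gcd(c1, m) == 1, whether all slots are probed
        print(c1, gcd(c1, m) == 1, covers_table(c1, m))
```

For this table size, the two Boolean columns agree for every increment tried, which matches the statement that $c_1$ must be relatively prime to $m$ for full coverage.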
Since we have $H_L(k, i + 1) = (h(k) + c_1 i + c_1) \bmod m$, and using the fact that for $a, b, m \in \mathbb{R}$,

$$(a + b) \bmod m = (a \bmod m + b \bmod m) \bmod m, \tag{2}$$

we may rewrite equation (1) as the following linear time-invariant first-order iterator:

$$H_L(k, i + 1) = (H_L(k, i) + (c_1 \bmod m)) \bmod m, \qquad H_L(k, 0) = h(k).$$

Note that the dependence on $k$ is specified in the initial condition $H_L(k, 0)$.

Quadratic hashing is a simple extension of linear hashing that makes the probe sequence depend nonlinearly on the probe number. For any ordinary hash function $h$, the family of quadratic hash functions is given by

$$H_Q(k, i) = (h(k) + c_1 i + c_2 i^2) \bmod m, \tag{3}$$

where $c_1$ and $c_2$ are positive constants. Once again, the specific values chosen for the constants are critical to the performance of this method (see [2] for details). To obtain a recurrence relation for equation (3), we note that

$$H_Q(k, i + 1) = [(h(k) + c_1 i + c_2 i^2) + c_1 + c_2(2i + 1)] \bmod m,$$

which leads to the time-varying first-order recurrence relation

$$H_Q(k, i + 1) = (H_Q(k, i) + c_1 + c_2(2i + 1)) \bmod m, \qquad H_Q(k, 0) = h(k). \tag{4}$$

1.2 Double hashing family $H_D$

It is well known that linear and quadratic hashing strategies suffer from clustering, because both have probe sequence increments that are independent of the key. Double hashing remedies this problem by introducing a second hash function that is used to compute the increment. Given two hash functions $g$ and $h$, the family $H_D$ of linear double hash functions is given by

$$H_D(k, i) = (g(k) + i\,h(k)) \bmod m. \tag{5}$$

In this family, the initial probe is $H_D(k, 0) = g(k)$, and successive probes are offset from previous probes by multiples of $h(k)$ modulo $m$. Thus the probe sequence depends on $k$ through both $g$ and $h$, and is linear in $g(k)$ and $h(k)$. A widely used member of $H_D$, proposed by Knuth [1973], has $g(k) = k \bmod m$ and $h(k) = k \bmod (m - 2)$, where both $m$ and $m - 2$ are prime.
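Recurrence (4) and the probe sequence of equation (5) can both be checked numerically. The following is a minimal Python sketch, not code from the paper: the function names and the parameter `home` (standing in for $h(k)$) are hypothetical, and the constants are illustrative. It compares the closed form (3) against the recurrence (4), and it generates double hashing probes using the choices $g(k) = k \bmod m$ and $h(k) = k \bmod (m - 2)$ as stated in the text.

```python
def quadratic_closed(home, c1, c2, m, i):
    """Equation (3): H_Q(k, i) = (h(k) + c1*i + c2*i^2) mod m."""
    return (home + c1 * i + c2 * i * i) % m


def quadratic_recurrence(home, c1, c2, m, n):
    """First n probes from recurrence (4), starting at H_Q(k, 0) = h(k) mod m."""
    probes = [home % m]
    for i in range(n - 1):
        # Step i -> i + 1 adds c1 + c2*(2i + 1), as in equation (4).
        probes.append((probes[-1] + c1 + c2 * (2 * i + 1)) % m)
    return probes


def double_hash_probes(k, m, n):
    """First n probes of equation (5) with g(k) = k mod m, h(k) = k mod (m - 2).

    With h(k) exactly as stated in the text, a key with k mod (m - 2) == 0
    gives a constant probe sequence; the example key below avoids that case.
    """
    g, h = k % m, k % (m - 2)
    return [(g + i * h) % m for i in range(n)]


if __name__ == "__main__":
    # Illustrative parameters, chosen only for this example.
    home, c1, c2, m = 7, 1, 3, 17
    closed = [quadratic_closed(home, c1, c2, m, i) for i in range(20)]
    print(closed == quadratic_recurrence(home, c1, c2, m, 20))

    # m = 13 and m - 2 = 11 are both prime; the key 27 has g = 1, h = 5.
    print(double_hash_probes(27, 13, 5))
```

The recurrence needs only the previous probe and the probe number, which is the form used when the hash function is treated as an iterated dynamical system.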
A recurrence relation for the family $H_D$ described by equation (5) is obtained by first noting that $H_D(k, i + 1) = [(g(k) + i\,h(k)) + h(k)] \bmod m$,